Abstract. We present lightweight and generic symbolic methods to improve the precision of numerical static analyses based on Abstract Interpretation. The main idea is to simplify numerical expressions before they are fed to abstract transfer functions. An important novelty is that these simplifications are performed on-the-fly, using information gathered dynamically by the analyzer.

A first method, called "linearization," allows abstracting arbitrary expressions into affine forms with interval coefficients while simplifying them. A second method, called "symbolic constant propagation," enhances the simplification feature of the linearization by propagating assigned expressions in a symbolic way. Combined together, these methods increase the relationality level of numerical abstract domains and make them more robust against program transformations. We show how they can be integrated within the classical interval, octagon and polyhedron domains. These methods have been incorporated within the Astree static analyzer that checks for the absence of run-time errors in embedded critical avionics software. We present an experimental proof of their usefulness.



1 Introduction 

Ensuring the correctness of software is a difficult but important task, especially in embedded critical applications such as planes or rockets. There is currently a great need for static analyzers able to provide invariants automatically and directly on the source code. As the strongest invariants are not computable in general, such tools need to perform sound approximations at the expense of completeness. In this article, we will only consider the properties of numerical variables and work in the Abstract Interpretation framework. A static analyzer is thus parameterized by a numerical abstract domain, that is, a set of computer-representable numerical properties together with algorithms to compute the semantics of program instructions.

There already exist quite a few numerical abstract domains. Well-known examples include the interval domain [5], which discovers variable bounds, and the polyhedron domain [8] for affine inequalities. Each domain achieves some cost versus precision balance. In particular, non-relational domains (e.g., the interval domain) are much faster but also much less precise than relational domains, which are able to discover variable relationships. Interval information may seem sufficient, as it allows expressing most correctness requirements, such as the absence of arithmetic overflows or out-of-bound array accesses. However, relational invariants are often necessary during the course of the analysis to find tight bounds.

X ← [-10, 20];
Y ← X;
if (Y < 0) { Y ← -X; }
// here, Y ∈ [0, 20]

Fig. 1. Absolute value computation example.

X ← [0, 1];
Y ← [0, 0.1];
Z ← [0, 0.2];
T ← (X × Y) - (X × Z) + Z;
// here, T ∈ [0, 0.2]

Fig. 2. Linear interpolation computation example.

Consider, for instance, the program of Fig. 1 that computes the absolute value of $X$. We expect the analyzer to infer that, at the end of the program, $Y \in [0, 20]$. The interval domain will find the coarser result $Y \in [-20, 20]$ because it cannot exploit the information $Y = X$ during the test $Y < 0$. The polyhedron domain is precise enough to infer the tightest bounds, but at the price of a loss of efficiency. In our second example, Fig. 2, $T$ is linearly interpolated between $Y$ and $Z$; thus, we have $T \in [0, 0.2]$. Using plain interval arithmetics, one finds the coarser result $T \in [-0.2, 0.3]$. As the assignment to $T$ is not affine, the polyhedron domain cannot perform any better.
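To make the imprecision on Fig. 2 concrete, here is a minimal sketch of plain interval arithmetic applied to its last assignment. The helper names `add`, `sub` and `mul` are illustrative only. Intervals are represented as pairs of exact rationals so that no floating-point noise blurs the bounds.

```python
from fractions import Fraction as F

# Intervals are (lo, hi) pairs of exact rationals.
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def mul(a, b):
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

# Variable ranges from Fig. 2.
X = (F(0), F(1))
Y = (F(0), F(1, 10))
Z = (F(0), F(1, 5))

# T <- (X * Y) - (X * Z) + Z, evaluated bottom-up: each occurrence of X
# is treated as an independent value, so the correlation between them is lost.
T = add(sub(mul(X, Y), mul(X, Z)), Z)
print(T)  # (Fraction(-1, 5), Fraction(3, 10)), i.e. [-0.2, 0.3]
```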

In this paper, we present symbolic enhancement techniques that can be applied to abstract domains to solve these problems and increase their robustness against program transformations. In Fig. 1, our symbolic constant propagation is able to propagate the information $Y = X$ and discover tight bounds using only the interval domain. In Fig. 2, our linearization technique allows us to prove that $T \in [0, 0.3]$ using the interval domain. This result is not optimal, but it is still much better than $T \in [-0.2, 0.3]$. The techniques are generic and can be applied to other domains, such as the polyhedron domain. However, the improvement varies greatly from one example to another, and enhanced domains do not enjoy best abstraction functions. Thus, our techniques depend upon strategies, some of which are proposed in the article.

Related Work. Our linearization can be related to affine arithmetics, a technique introduced by Vinicius et al. in [16] to refine interval arithmetics by taking into account existing correlations between computed quantities. Both use a symbolic form with linear properties to allow basic algebraic simplifications. The main difference is that we relate program variables directly, while affine arithmetics introduces synthetic variables. This allows us to treat control-flow joins and loops, and to interact with relational domains, which is not possible with affine arithmetics. Our linearization was first introduced in [13] to abstract floating-point arithmetics. It is presented here with some improvements, including the introduction of several strategies.

Fig. 3. Syntax of our simple language.

Our symbolic constant propagation technique is similar to the classical constant propagation proposed by Kildall in [11] to perform optimization. However, scalar constants are replaced with expression trees, and our goal is to improve the precision of the abstract execution rather than its efficiency. It is also related to the work of Colby: in [4], he introduces a language of transfer relations to propagate, combine and simplify, in a fully symbolic way, sequences of transfer functions. We are more modest, as we do not handle disjunctions symbolically and do not try to infer symbolic loop invariants. Instead, we rely on the underlying numerical abstract domain to perform most of the semantic work. A major difference is that Colby's framework statically transforms the abstract equation system to be solved by the analyzer, whereas our framework performs this transformation on-the-fly and benefits from the information dynamically inferred by the analyzer.

Overview of the Paper. The paper is organised as follows. In Sect. 2, we introduce a language, much simplified for the sake of illustration, and recall how to perform a numerical static analysis parameterized by an abstract domain. Sect. 3 then explains how symbolic expression manipulations can be soundly incorporated within the analysis. Two symbolic methods are then introduced: expression linearization, in Sect. 4, and symbolic constant propagation, in Sect. 5. Sect. 6 discusses our practical implementation within the Astree static analyzer and presents some experimental results. We conclude in Sect. 7.

2 Framework 

In this section, we briefly recall the classical design of a static analyzer using 
the Abstract Interpretation framework by Cousot and Cousot [6, 7]. This design 
is specialised towards the automatic computation of numerical invariants, and 
thus, is parameterized by a numerical abstract domain. 

2.1 Syntax of the Language 

For the sake of presentation, we will only consider in this article a very simplified programming language focusing on manipulating numerical variables.



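$\llbracket X \rrbracket(\rho) = \{ \rho(X) \}$

$\llbracket [a, b] \rrbracket(\rho) = \{ x \in \mathbb{I} \mid a \leq x \leq b \}$

$\llbracket e_1 \diamond e_2 \rrbracket(\rho) = \{ x \diamond y \mid x \in \llbracket e_1 \rrbracket(\rho),\ y \in \llbracket e_2 \rrbracket(\rho) \}$, $\quad \diamond \in \{+, -, \times\}$

$\llbracket e_1 / e_2 \rrbracket(\rho) = \{ \mathrm{truncate}(x/y) \mid x \in \llbracket e_1 \rrbracket(\rho),\ y \in \llbracket e_2 \rrbracket(\rho),\ y \neq 0 \}$ if $\mathbb{I} = \mathbb{Z}$

$\llbracket e_1 / e_2 \rrbracket(\rho) = \{ x/y \mid x \in \llbracket e_1 \rrbracket(\rho),\ y \in \llbracket e_2 \rrbracket(\rho),\ y \neq 0 \}$ if $\mathbb{I} \neq \mathbb{Z}$

$\{\!|\, X \leftarrow e \,|\!\}(R) = \{ \rho[X \mapsto v] \mid \rho \in R,\ v \in \llbracket e \rrbracket(\rho) \}$

$\{\!|\, e \bowtie 0\,? \,|\!\}(R) = \{ \rho \mid \rho \in R \text{ and } \exists v \in \llbracket e \rrbracket(\rho),\ v \bowtie 0 \text{ holds} \}$

Fig. 4. Concrete semantics.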



We suppose that a program manipulates only a fixed, finite set of $n$ variables, $\mathcal{V} = \{V_1, \ldots, V_n\}$, with values within a perfect mathematical set, $\mathbb{I} \in \{\mathbb{Z}, \mathbb{Q}, \mathbb{R}\}$. A program $P \in \mathcal{P}(\mathcal{L} \times inst \times \mathcal{L})$ is a single control-flow graph where nodes are program points, in $\mathcal{L}$, and arcs are labelled by instructions in $inst$. We denote by $e$ the entry program point. As described in Fig. 3, only two types of instructions are allowed: assignments ($X \leftarrow expr$) and tests ($expr \bowtie 0\,?$), where $expr$ ranges over numerical expressions and $\bowtie$ is a comparison operator. In the syntax of expressions, classical numerical constants have been replaced with intervals $[a, b]$ with constant bounds, possibly $+\infty$ or $-\infty$. Such intervals correspond to a non-deterministic choice of a new value within the bounds each time the expression is evaluated. This will be key in defining the concept of expression abstraction in Sects. 3-5. Moreover, interval constants appear naturally in programs that fetch input values from an external environment, or when modeling rounding errors in floating-point computations.

Affine forms play an important role in program analysis, as they are easy to manipulate and appear frequently as program invariants. We enhance affine forms with the non-determinism of intervals by defining interval affine forms as the expressions of the form $[a_0, b_0] + \sum_k ([a_k, b_k] \times V_k)$.



2.2 Concrete Semantics of the Language 

The concrete semantics of a program is the most precise mathematical expression of its behavior. Let us first define an environment as a function, in $\mathcal{V} \to \mathbb{I}$, associating a value to each variable. We choose a simple invariant semantics that associates to each program point $\ell \in \mathcal{L}$ the set of all environments $\mathcal{X}_\ell \in \mathcal{P}(\mathcal{V} \to \mathbb{I})$ that can hold when $\ell$ is reached. Given an environment $\rho \in (\mathcal{V} \to \mathbb{I})$, the semantics $\llbracket expr \rrbracket(\rho)$ of an expression $expr$, shown in Fig. 4, is the set of values the expression can evaluate to. It outputs a set to account for non-determinism. When $\mathbb{I} = \mathbb{Z}$, the truncate function rounds the possibly non-integer result of the division towards an integer by truncation, as is common in most computer languages. Divisions by zero are undefined, that is, they return no result; for the sake of simplicity, we have not introduced any error state. The semantics of assignments and tests is defined by transfer functions $\{\!|\, inst \,|\!\} : \mathcal{P}(\mathcal{V} \to \mathbb{I}) \to \mathcal{P}(\mathcal{V} \to \mathbb{I})$ in Fig. 4. The assignment transfer function returns environments where one variable has changed its value ($\rho[V \mapsto x]$ denotes the function equal to $\rho$ on $\mathcal{V} \setminus \{V\}$ and that maps $V$ to $x$). The test transfer function filters environments to keep only those that may satisfy the test. We can now define the semantics $(\mathcal{X}_\ell)_{\ell \in \mathcal{L}}$ of a program $P$ as the smallest solution of the following equation system:

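$$\begin{cases} \mathcal{X}_e = (\mathcal{V} \to \mathbb{I}) \\ \mathcal{X}_\ell = \bigcup_{(\ell', i, \ell) \in P} \{\!|\, i \,|\!\}(\mathcal{X}_{\ell'}) & \text{when } \ell \neq e \end{cases} \qquad (1)$$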

It describes the strongest invariant at each program point. 



2.3 Abstract Interpretation and Numerical Abstract Domains 

The concrete semantics is very precise but cannot be computed fully automatically by a computer. We will only try to compute a sound overapproximation, that is, a superset of the environments reached by the program. We use Abstract Interpretation [6, 7] to design such an approximation.

Numerical Abstract Domains. An analysis is parameterized by a numerical abstract domain that allows representing and manipulating selected subsets of environments. Formally, it is defined as:

• a set of computer-representable abstract elements $\mathcal{D}^\sharp$,

• a partial order $\sqsubseteq^\sharp$ on $\mathcal{D}^\sharp$ to model the relative precision of abstract elements,

• a monotonic concretization $\gamma : \mathcal{D}^\sharp \to \mathcal{P}(\mathcal{V} \to \mathbb{I})$ that assigns a concrete property to each abstract element,

• a greatest element $\top^\sharp$ for $\sqsubseteq^\sharp$ such that $\gamma(\top^\sharp) = (\mathcal{V} \to \mathbb{I})$,

• sound and computable abstract versions $\{\!|\, inst \,|\!\}^\sharp$ of all transfer functions,

• sound and computable abstractions $\cup^\sharp$ and $\cap^\sharp$ of $\cup$ and $\cap$,

• a widening operator $\triangledown^\sharp$ if $\mathcal{D}^\sharp$ has infinite increasing chains.

The soundness condition for the abstraction $F^\sharp : (\mathcal{D}^\sharp)^n \to \mathcal{D}^\sharp$ of an $n$-ary operator $F$ is: $F(\gamma(X^\sharp_1), \ldots, \gamma(X^\sharp_n)) \subseteq \gamma(F^\sharp(X^\sharp_1, \ldots, X^\sharp_n))$. It ensures that $F^\sharp$ does not forget any of $F$'s behaviors. It can, however, introduce spurious ones.

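Abstract Analysis. Given an abstract domain, an abstract version $(1^\sharp)$ of the equation system (1) can be derived as:

$$\begin{cases} \mathcal{X}^\sharp_e = \top^\sharp \\ \mathcal{X}^\sharp_\ell = \bigcup^\sharp_{(\ell', i, \ell) \in P} \{\!|\, i \,|\!\}^\sharp(\mathcal{X}^\sharp_{\ell'}) & \text{when } \ell \neq e \end{cases} \qquad (1^\sharp)$$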

The soundness condition ensures that any solution of $(1^\sharp)$ satisfies $\forall \ell \in \mathcal{L},\ \gamma(\mathcal{X}^\sharp_\ell) \supseteq \mathcal{X}_\ell$. The system can be solved by iterations, using a widening operator $\triangledown^\sharp$ to ensure termination. We refer the reader to Bourdoncle [2] for an in-depth description of possible iteration strategies. The computed $\mathcal{X}^\sharp_\ell$ is almost never the best abstraction (if it exists) of the concrete solution $\mathcal{X}_\ell$. Unavoidable losses of precision come from three sources: the use of the convergence acceleration $\triangledown^\sharp$, abstract transfer functions that are not necessarily best, and the fact that the composition of best abstractions is generally not a best abstraction. This last issue explains why even the simplest semantics-preserving program transformations can drastically affect the quality of a static analysis.

Existing Numerical Domains. There exist many numerical abstract domains. We will be mostly interested in those able to express variable bounds. Such abstract domains include the well-known interval domain [5], able to express invariants of the form $\bigwedge_i V_i \in [a_i, b_i]$, and the polyhedron domain [8], able to express affine inequalities $\bigwedge_j \sum_i \alpha_{ij} V_i \geq \beta_j$. More recent domains, in-between these two in terms of cost and precision, include the octagon domain [12] ($\bigwedge_{ij} \pm V_i \pm V_j \leq c_{ij}$), the octahedron domain [3] ($\bigwedge_j \sum_i \alpha_{ij} V_i \geq \beta_j$ where $\alpha_{ij} \in \{-1, 0, 1\}$), and the Two Variable Per Inequality domain [15] ($\bigwedge_j \alpha_j V_{i_j} + \beta_j V_{k_j} \leq c_j$).

3 Incorporating Symbolic Methods 

We suppose that we are given a numerical abstract domain $\mathcal{D}^\sharp$. The gist of our method is to replace, in the abstract transfer functions $\{\!|\, X \leftarrow e \,|\!\}^\sharp$ and $\{\!|\, e \bowtie 0\,? \,|\!\}^\sharp$, each expression $e$ with another one, $e'$, in a sound way.

Partial Order on Expressions. To define formally the notion of sound expression abstraction, we first introduce an approximation order $\preceq$ on expressions. A natural choice is to consider the point-wise ordering of the concrete semantics $\llbracket \cdot \rrbracket$ defined in Fig. 4, that is: $e_1 \preceq e_2 \iff \forall \rho \in (\mathcal{V} \to \mathbb{I}),\ \llbracket e_1 \rrbracket(\rho) \subseteq \llbracket e_2 \rrbracket(\rho)$.

However, requiring the inclusion to hold for all environments is quite restrictive. More aggressive expression transformations can be enabled by only requiring soundness with respect to selected sets of environments. Our partial order $\preceq$ is now defined "up to" a set of environments $R \in \mathcal{P}(\mathcal{V} \to \mathbb{I})$:

Definition 1. $R \models e_1 \preceq e_2 \iff \forall \rho \in R,\ \llbracket e_1 \rrbracket(\rho) \subseteq \llbracket e_2 \rrbracket(\rho)$.

We denote by $R \models e_1 \doteq e_2$ the associated equality relation.
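When $R$ is a small, finite set of environments, Definition 1 can be checked by brute force. The Python sketch below is only an illustration of the definition and plays no part in the analysis. The tuple encoding of expressions and the names `ev` and `leq` are assumptions. It shows that $Y - X \preceq [0, 0]$ holds up to the environments where $Y = X$, but fails once arbitrary environments are allowed.

```python
def ev(e, rho):
    """[[e]](rho) for +, -, * and finite integer intervals."""
    tag = e[0]
    if tag == "var":
        return {rho[e[1]]}
    if tag == "itv":
        return set(range(e[1], e[2] + 1))
    xs, ys = ev(e[1], rho), ev(e[2], rho)
    if tag == "+":
        return {x + y for x in xs for y in ys}
    if tag == "-":
        return {x - y for x in xs for y in ys}
    if tag == "*":
        return {x * y for x in xs for y in ys}
    raise ValueError(tag)

def leq(R, e1, e2):
    """R |= e1 <= e2 (Definition 1), for a finite list of environments R."""
    return all(ev(e1, rho) <= ev(e2, rho) for rho in R)

X, Y = ("var", "X"), ("var", "Y")
diag = [{"X": x, "Y": x} for x in range(4)]                     # Y = X
square = [{"X": x, "Y": y} for x in range(4) for y in range(4)]  # no relation

print(leq(diag, ("-", Y, X), ("itv", 0, 0)))    # True
print(leq(square, ("-", Y, X), ("itv", 0, 0)))  # False
```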

Sound Symbolic Transformations. We now wish to abstract some transfer function, e.g., $\{\!|\, V \leftarrow e \,|\!\}$, on an abstract environment $R^\sharp \in \mathcal{D}^\sharp$. The following theorem states that, if $e'$ overapproximates $e$ on $\gamma(R^\sharp)$, it is sound to replace $e$ with $e'$ in the abstract transfer functions:

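Theorem 1. If $\gamma(R^\sharp) \models e \preceq e'$, then:

• $(\{\!|\, V \leftarrow e \,|\!\} \circ \gamma)(R^\sharp) \subseteq (\gamma \circ \{\!|\, V \leftarrow e' \,|\!\}^\sharp)(R^\sharp)$,

• $(\{\!|\, e \bowtie 0\,? \,|\!\} \circ \gamma)(R^\sharp) \subseteq (\gamma \circ \{\!|\, e' \bowtie 0\,? \,|\!\}^\sharp)(R^\sharp)$.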

4 Expression Linearization 

Our first symbolic transformation is an abstraction of arbitrary expressions into interval affine forms $i_0 + \sum_k (i_k \times V_k)$, where the $i$'s stand for intervals.

4.1 Definitions 

Interval Affine Form Operators. We first introduce a few operators to manipulate interval affine forms in a symbolic way. Using the classical interval arithmetic operators, denoted with an $\mathbb{I}$ superscript, we can define point-wisely the addition $\boxplus$ and subtraction $\boxminus$ of affine forms, as well as the multiplication $\boxtimes$ and division $\oslash$ of an affine form by a constant interval:

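Definition 2.

• $(i_0 + \sum_k i_k \times V_k) \boxplus (i'_0 + \sum_k i'_k \times V_k) \stackrel{\mathrm{def}}{=} (i_0 +^{\mathbb{I}} i'_0) + \sum_k (i_k +^{\mathbb{I}} i'_k) \times V_k$,

• $(i_0 + \sum_k i_k \times V_k) \boxminus (i'_0 + \sum_k i'_k \times V_k) \stackrel{\mathrm{def}}{=} (i_0 -^{\mathbb{I}} i'_0) + \sum_k (i_k -^{\mathbb{I}} i'_k) \times V_k$,

• $i \boxtimes (i_0 + \sum_k i_k \times V_k) \stackrel{\mathrm{def}}{=} (i \times^{\mathbb{I}} i_0) + \sum_k (i \times^{\mathbb{I}} i_k) \times V_k$,

• $(i_0 + \sum_k i_k \times V_k) \oslash i \stackrel{\mathrm{def}}{=} (i_0 /^{\mathbb{I}} i) + \sum_k (i_k /^{\mathbb{I}} i) \times V_k$,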

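where the interval arithmetic operators are defined classically as:

• $[a, b] +^{\mathbb{I}} [a', b'] = [a + a', b + b']$,

• $[a, b] -^{\mathbb{I}} [a', b'] = [a - b', b - a']$,

• $[a, b] \times^{\mathbb{I}} [a', b'] = [\min(aa', ab', ba', bb'), \max(aa', ab', ba', bb')]$,

• $[a, b] /^{\mathbb{I}} [a', b'] = \begin{cases} [-\infty, +\infty] & \text{if } 0 \in [a', b'] \\ [\min(a/a', a/b', b/a', b/b'), \max(a/a', a/b', b/a', b/b')] & \text{when } \mathbb{I} \neq \mathbb{Z} \\ [\lfloor \min(a/a', a/b', b/a', b/b') \rfloor, \lceil \max(a/a', a/b', b/a', b/b') \rceil] & \text{when } \mathbb{I} = \mathbb{Z} \end{cases}$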

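The following theorem states that these operators are always sound and, in some cases, complete, i.e., $\preceq$ can be replaced by $\doteq$:

Theorem 2. For all interval affine forms $l_1, l_2$ and interval $i$, we have:

• $\mathbb{I}^{\mathcal{V}} \models l_1 + l_2 \doteq l_1 \boxplus l_2$, • $\mathbb{I}^{\mathcal{V}} \models l_1 - l_2 \doteq l_1 \boxminus l_2$,

• $\mathbb{I}^{\mathcal{V}} \models i \times l_1 \doteq i \boxtimes l_1$ if $\mathbb{I} \neq \mathbb{Z}$, • $\mathbb{I}^{\mathcal{V}} \models i \times l_1 \preceq i \boxtimes l_1$ otherwise,

• $\mathbb{I}^{\mathcal{V}} \models l_1 / i \doteq l_1 \oslash i$ if $\mathbb{I} \neq \mathbb{Z}$ and $0 \notin i$, • $\mathbb{I}^{\mathcal{V}} \models l_1 / i \preceq l_1 \oslash i$ otherwise.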
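The classical interval operators defined above translate directly into code. Below is a minimal Python sketch. It assumes finite bounds, because infinite bounds would need extra care for products such as $0 \times \infty$, and it uses `fractions.Fraction` for exact division. The function names are illustrative. The `integers` flag applies the outward rounding used when $\mathbb{I} = \mathbb{Z}$.

```python
import math
from fractions import Fraction

def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_mul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def i_div(a, b, integers=False):
    if b[0] <= 0 <= b[1]:
        return (-math.inf, math.inf)           # divisor may be zero
    q = [Fraction(x) / Fraction(y) for x in a for y in b]
    lo, hi = min(q), max(q)
    if integers:                               # I = Z: round bounds outward
        return (math.floor(lo), math.ceil(hi))
    return (lo, hi)

print(i_sub((0, 1), (0, 2)))                 # (-2, 1)
print(i_div((1, 1), (2, 2), integers=True))  # (0, 1)
```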

When $\mathbb{I} = \mathbb{Z}$, we must conservatively round upper and lower bounds respectively towards $+\infty$ and $-\infty$ to ensure that Thm. 2 holds. The non-exactness of the multiplication and division can then lead to some precision degradation. For instance, $(X \oslash 2) \boxtimes 2$ evaluates to $[0, 2] \times X$ because, when computing $X \oslash 2$, the non-integral value $1/2$ must be abstracted into the integral interval $[0, 1]$. One solution is to perform all computations in $\mathbb{R}$, keeping in mind that, due to truncation, $l / [a, b]$ should be interpreted, when $0 \notin [a, b]$, as $(l \oslash [a, b]) \boxplus [-1 + x, 1 - x]$, where $x = 1/\min(|a|, |b|)$. We then obtain the more precise result $X + [-1, 1]$.
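The following sketch reproduces the $(X \oslash 2) \boxtimes 2$ example. It represents an interval affine form as a pair (constant interval, map from variables to coefficient intervals). This encoding and all helper names are hypothetical. `trunc_div` implements the reading of $l / [a, b]$ described above, rational division plus a truncation error term, and only for divisors that do not contain 0.

```python
import math
from fractions import Fraction as Fr

def imul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def idiv(a, b, integers):
    assert not (b[0] <= 0 <= b[1]), "sketch assumes 0 is not in the divisor"
    q = [Fr(x) / Fr(y) for x in a for y in b]
    lo, hi = min(q), max(q)
    return (math.floor(lo), math.ceil(hi)) if integers else (lo, hi)

# An interval affine form: (constant interval, {variable: coefficient interval}).
def aff_scale(i, l):            # i ⊠ l
    c, cs = l
    return (imul(i, c), {v: imul(i, k) for v, k in cs.items()})

def aff_div(l, i, integers):    # l ⊘ i
    c, cs = l
    return (idiv(c, i, integers), {v: idiv(k, i, integers) for v, k in cs.items()})

def aff_add_const(l, i):        # l ⊞ i, for a constant interval i
    c, cs = l
    return ((c[0] + i[0], c[1] + i[1]), cs)

def trunc_div(l, i):
    """l / i for I = Z: divide in the rationals, then add [-1 + x, 1 - x]."""
    a, b = i
    x = Fr(1, min(abs(a), abs(b)))
    return aff_add_const(aff_div(l, i, integers=False), (-1 + x, 1 - x))

X = ((0, 0), {"X": (1, 1)})
two = (2, 2)
naive = aff_scale(two, aff_div(X, two, integers=True))
refined = aff_scale(two, trunc_div(X, two))
print(naive)    # ((0, 0), {'X': (0, 2)}), i.e. [0, 2] x X
print(refined)  # constant [-1, 1], coefficient of X exactly 1: X + [-1, 1]
```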

We now introduce a so-called "intervalization" operator, $\iota$, to abstract interval affine forms into intervals. Given an abstract environment, it evaluates the affine form using interval arithmetics. Suppose that $\mathcal{D}^\sharp$ provides us with projection operators $\pi_k : \mathcal{D}^\sharp \to \mathcal{P}(\mathbb{I})$ able to return an interval overapproximation for each variable $V_k$. We define $\iota$ as:

Definition 3. $\iota(i_0 + \sum_k (i_k \times V_k))(R^\sharp) \stackrel{\mathrm{def}}{=} i_0 +^{\mathbb{I}} \sum^{\mathbb{I}}_k (i_k \times^{\mathbb{I}} \pi_k(R^\sharp))$, where each $\pi_k(R^\sharp)$ is an interval containing $\{ \rho(V_k) \mid \rho \in \gamma(R^\sharp) \}$.

The following theorem states that $\iota$ is a sound operator with respect to $R^\sharp$:

Theorem 3. $\gamma(R^\sharp) \models l \preceq \iota(l)(R^\sharp)$.



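As $\pi_k$ performs a non-relational abstraction, $\iota$ incurs a loss of precision whenever $\mathcal{D}^\sharp$ is a relational domain. Consider, for instance, $R^\sharp$ such that $\gamma(R^\sharp) = \{ \rho \in (\{V_1, V_2\} \to [0, 1]) \mid \rho(V_1) = \rho(V_2) \}$. Then, $\llbracket \iota(V_1 - V_2)(R^\sharp) \rrbracket$ is the constant function $[-1, 1]$, while $\llbracket V_1 - V_2 \rrbracket$ is $0$ on $\gamma(R^\sharp)$.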
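Definition 3 and the example above can be sketched in a few lines of Python. This is an illustration under stated assumptions. The form encoding is hypothetical. The dictionary `bounds` stands in for the projections $\pi_k(R^\sharp)$ that a real domain would supply. Because only per-variable bounds are consulted, the relation $V_1 = V_2$ cannot be seen.

```python
def i_mul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def iota(form, bounds):
    """Def. 3: evaluate i0 + sum_k ik x Vk with interval arithmetic.
    bounds[v] plays the role of the projection pi_k(R#)."""
    (lo, hi), coeffs = form
    for v, k in coeffs.items():
        p = i_mul(k, bounds[v])
        lo, hi = lo + p[0], hi + p[1]
    return (lo, hi)

# V1 - V2 where V1 = V2 holds, but the projections only report [0, 1] for each.
diff = ((0, 0), {"V1": (1, 1), "V2": (-1, -1)})
print(iota(diff, {"V1": (0, 1), "V2": (0, 1)}))  # (-1, 1)
```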

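Linearization. The linearization $(\!|\, e \,|\!)(R^\sharp)$ of an arbitrary expression $e$ in an abstract environment $R^\sharp$ can now be defined by structural induction as follows:

Definition 4.

• $(\!|\, V \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} [1, 1] \times V$, • $(\!|\, [a, b] \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} [a, b]$,

• $(\!|\, e_1 + e_2 \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} (\!|\, e_1 \,|\!)(R^\sharp) \boxplus (\!|\, e_2 \,|\!)(R^\sharp)$,

• $(\!|\, e_1 - e_2 \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} (\!|\, e_1 \,|\!)(R^\sharp) \boxminus (\!|\, e_2 \,|\!)(R^\sharp)$,

• $(\!|\, e_1 / e_2 \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} (\!|\, e_1 \,|\!)(R^\sharp) \oslash \iota((\!|\, e_2 \,|\!)(R^\sharp))(R^\sharp)$,

• $(\!|\, e_1 \times e_2 \,|\!)(R^\sharp) \stackrel{\mathrm{def}}{=} \iota((\!|\, e_1 \,|\!)(R^\sharp))(R^\sharp) \boxtimes (\!|\, e_2 \,|\!)(R^\sharp)$ or $\iota((\!|\, e_2 \,|\!)(R^\sharp))(R^\sharp) \boxtimes (\!|\, e_1 \,|\!)(R^\sharp)$ (see Sect. 4.3).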

The $\iota$ operator is used to deal with non-linear constructions: the right argument of a division and either argument of a multiplication are intervalized. As a consequence of Thms. 2 and 3, our linearization is sound:

Theorem 4. $\gamma(R^\sharp) \models e \preceq (\!|\, e \,|\!)(R^\sharp)$.

Obviously, $(\!|\, \cdot \,|\!)$ generally incurs a loss of precision with respect to $\preceq$. Also, $(\!|\, e \,|\!)$ is not monotonic in its $e$ argument. Consider, for instance, $X/X$ in the environment $R^\sharp$ such that $\pi_X(R^\sharp) = [1, +\infty]$. Although $\gamma(R^\sharp) \models X/X \preceq [1, 1]$, we do not have $\gamma(R^\sharp) \models (\!|\, X/X \,|\!)(R^\sharp) \preceq (\!|\, [1, 1] \,|\!)(R^\sharp)$, as $(\!|\, X/X \,|\!)(R^\sharp) = [0, 1] \times X$. It is important to note that there is no useful notion of best abstraction of expressions for $\preceq$.
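Definition 4 fits in a short recursive function. The Python sketch below is one possible reading under explicit assumptions. It uses floats so that $+\infty$ bounds are available, and it handles $\mathbb{I} \neq \mathbb{Z}$ only. A plain `bounds` dictionary replaces the projections $\pi_k$. The multiplication choice is a pluggable callback, and its default, `constant_first`, is a hypothetical rule, not one of the strategies of Sect. 4.3. The usage reproduces $(\!|\, X/X \,|\!)(R^\sharp) = [0, 1] \times X$ and the $[0, 0.3]$ bound on Fig. 2 obtained when $X$ is intervalized in both products.

```python
import math

INF = math.inf

def imul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def idiv(a, b):
    if b[0] <= 0 <= b[1]:
        return (-INF, INF)
    q = [x / y for x in a for y in b]  # sketch: inf/inf is not handled
    return (min(q), max(q))

def comb(i, j, sign):
    """Interval addition (sign > 0) or subtraction (sign < 0)."""
    return (i[0] + j[0], i[1] + j[1]) if sign > 0 else (i[0] - j[1], i[1] - j[0])

def f_comb(l1, l2, sign):
    """l1 ⊞ l2 or l1 ⊟ l2 on forms (constant, {var: coefficient})."""
    zero = (0, 0)
    keys = l1[1].keys() | l2[1].keys()
    return (comb(l1[0], l2[0], sign),
            {v: comb(l1[1].get(v, zero), l2[1].get(v, zero), sign) for v in keys})

def f_scale(i, l):  # i ⊠ l
    return (imul(i, l[0]), {v: imul(i, k) for v, k in l[1].items()})

def f_div(l, i):    # l ⊘ i (I != Z assumed)
    return (idiv(l[0], i), {v: idiv(k, i) for v, k in l[1].items()})

def iota(l, bounds):
    lo, hi = l[0]
    for v, k in l[1].items():
        p = imul(k, bounds[v])
        lo, hi = lo + p[0], hi + p[1]
    return (lo, hi)

def constant_first(l1, l2):
    """Hypothetical default: intervalize l1 unless only l2 is variable-free."""
    return not l1[1] or bool(l2[1])

def lin(e, bounds, pick_left=constant_first):
    tag = e[0]
    if tag == "var":
        return ((0, 0), {e[1]: (1, 1)})
    if tag == "itv":
        return ((e[1], e[2]), {})
    l1 = lin(e[1], bounds, pick_left)
    l2 = lin(e[2], bounds, pick_left)
    if tag == "+":
        return f_comb(l1, l2, +1)
    if tag == "-":
        return f_comb(l1, l2, -1)
    if tag == "/":
        return f_div(l1, iota(l2, bounds))
    if tag == "*":
        if pick_left(l1, l2):
            return f_scale(iota(l1, bounds), l2)
        return f_scale(iota(l2, bounds), l1)
    raise ValueError(tag)

X, Y, Z = ("var", "X"), ("var", "Y"), ("var", "Z")
res = lin(("/", X, X), {"X": (1, INF)})
print(res)  # ((0.0, 0.0), {'X': (0.0, 1.0)}), i.e. [0, 1] x X

fig2 = ("+", ("-", ("*", X, Y), ("*", X, Z)), Z)
B = {"X": (0, 1), "Y": (0, 0.1), "Z": (0, 0.2)}
print(iota(lin(fig2, B), B))  # approximately (0, 0.3)
```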



4.2 Integration With a Numerical Abstract Domain 

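Given an abstract domain $\mathcal{D}^\sharp$, we can now derive a new abstract domain with linearization, $\mathcal{D}^\sharp_{\mathcal{L}}$, identical to $\mathcal{D}^\sharp$ except for the following transfer functions:

$\{\!|\, V \leftarrow e \,|\!\}^\sharp_{\mathcal{L}}(R^\sharp) \stackrel{\mathrm{def}}{=} \{\!|\, V \leftarrow (\!|\, e \,|\!)(R^\sharp) \,|\!\}^\sharp(R^\sharp)$

$\{\!|\, e \bowtie 0\,? \,|\!\}^\sharp_{\mathcal{L}}(R^\sharp) \stackrel{\mathrm{def}}{=} \{\!|\, (\!|\, e \,|\!)(R^\sharp) \bowtie 0\,? \,|\!\}^\sharp(R^\sharp)$

The soundness of these transfer functions is guaranteed by Thms. 1 and 4.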

Application to the Interval Domain. Like all non-relational domains, the interval domain [5] is not able to exploit the fact that the same variable occurs several times in an expression. Our linearization performs some symbolic simplification and so is able to partly correct this problem. Consider, for instance, the assignment $\{\!|\, Y \leftarrow 3 \times X - X \,|\!\}$ in an abstract environment such that $X \in [a, b]$. The regular interval domain $\mathcal{D}^{\mathcal{I}}$ will assign $[3a - b, 3b - a]$ to $Y$, while $\mathcal{D}^{\mathcal{I}}_{\mathcal{L}}$ will assign $[2a, 2b]$, as $(\!|\, 3 \times X - X \,|\!)(R^\sharp) = 2 \times X$. This last answer is strictly more precise whenever $a \neq b$. Using the exactness of Thm. 2, one can prove that, when $\mathbb{I} \neq \mathbb{Z}$, the assignment in $\mathcal{D}^{\mathcal{I}}_{\mathcal{L}}$ is always more precise than in $\mathcal{D}^{\mathcal{I}}$. This may not be the case for a test, or when $\mathbb{I} = \mathbb{Z}$.
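The $3 \times X - X$ example can be checked numerically. This small sketch uses illustrative names and takes $X \in [1, 2]$ as an arbitrary sample range. It contrasts bottom-up interval evaluation with the linearized form. In the linearized form, $\boxminus$ merges the two coefficients of $X$ into $[2, 2]$ before any interval evaluation takes place.

```python
def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_mul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def plain(x):
    """Interval domain alone: evaluate 3 x X - X bottom-up."""
    return i_sub(i_mul((3, 3), x), x)

def linearized(x):
    """With linearization: the coefficients of X are combined first,
    (3,3) -I (1,1) = (2,2), and the form 2 x X is evaluated once."""
    coeff = i_sub((3, 3), (1, 1))
    return i_mul(coeff, x)

print(plain((1, 2)))       # (1, 5)
print(linearized((1, 2)))  # (2, 4)
```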

Application to the Octagon Domain. The octagon domain [12] is more precise than the interval one, but it is more complex. As a consequence, it is quite difficult to design abstract transfer functions for non-linear expressions. This problem can be solved by using our linearization in combination with the efficient and rather precise interval affine form abstract transfer functions proposed in our previous work [14]. The octagon domain with linearization is able to prove, for instance, that, after the assignment $X \leftarrow T \times Y$ in an environment such that $T \in [-1, 1]$, we have $-Y \leq X \leq Y$.

Application to the Polyhedron Domain. The polyhedron domain [8] is more precise than the octagon domain but cannot deal with full interval affine forms: only the constant coefficient may safely be an interval. To solve this problem, we introduce a function $\mu$ to abstract interval affine forms further by making all variable coefficients singletons. For the sake of conciseness, we give a formula valid only for $\mathbb{I} \neq \mathbb{Z}$ and finite interval bounds:

Definition 5.

$$\mu\Big([a_0, b_0] + \sum_k [a_k, b_k] \times V_k\Big)(R^\sharp) \stackrel{\mathrm{def}}{=} \Big([a_0, b_0] +^{\mathbb{I}} \sum^{\mathbb{I}}_k \big([(a_k - b_k)/2, (b_k - a_k)/2] \times^{\mathbb{I}} \pi_k(R^\sharp)\big)\Big) + \sum_k ((a_k + b_k)/2) \times V_k$$

$\mu$ works by "distributing" the weight $b_k - a_k$ of each variable coefficient into the constant component, using variable bounds information from $R^\sharp$. One can prove that $\mu$ is sound, that is, $\gamma(R^\sharp) \models l \preceq \mu(l)(R^\sharp)$.
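A minimal sketch of Definition 5 follows, for $\mathbb{I} \neq \mathbb{Z}$ and finite bounds. The encoding is hypothetical, and the dictionary `bounds` again stands in for $\pi_k(R^\sharp)$. Each coefficient $[a_k, b_k]$ is split into its midpoint, which stays on $V_k$, and a symmetric half-width interval, which is multiplied by the bounds of $V_k$ and moved into the constant.

```python
from fractions import Fraction as Fr

def mu(form, bounds):
    """Def. 5 sketch (I != Z, finite bounds). form = ((a0, b0), {v: (ak, bk)});
    bounds[v] plays the role of pi_k(R#)."""
    (lo, hi), coeffs = form
    lo, hi = Fr(lo), Fr(hi)
    singletons = {}
    for v, (a, b) in coeffs.items():
        r = Fr(b - a) / 2                  # half-width of the coefficient
        singletons[v] = Fr(a + b) / 2      # midpoint kept on V_k
        p = [x * y for x in (-r, r) for y in bounds[v]]
        lo, hi = lo + min(p), hi + max(p)  # spill the rest into the constant
    return ((lo, hi), singletons)

# [1, 3] x X with X in [0, 10] becomes 2 x X + [-10, 10].
print(mu(((0, 0), {"X": (1, 3)}), {"X": (0, 10)}))
```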

Application to Floating-Point Arithmetics. Real-life programming languages do not manipulate rationals or reals, but floating-point numbers, which are much more difficult to abstract. Pervasive rounding must be taken into account. As most classical properties of arithmetic operators are no longer true, it is generally not safe to feed floating-point expressions to relational domains. One solution is to convert such expressions into real-valued expressions by making rounding explicit. Rounding is highly non-linear but can be abstracted using intervals. For instance, $X + Y$ in the floating-point world can be abstracted into $[1 - \varepsilon_1, 1 + \varepsilon_1] \times X + [1 - \varepsilon_1, 1 + \varepsilon_1] \times Y + [-\varepsilon_2, \varepsilon_2]$ using small constants $\varepsilon_1$ and $\varepsilon_2$ modeling, respectively, relative and absolute errors. This fits in our linearization framework, which can be extended to treat floating-point arithmetics soundly. We refer the reader to related work [13] for more information.

4.3 Multiplication Strategies 

When encountering a multiplication $e_1 \times e_2$ where neither $(\!|\, e_1 \,|\!)(R^\sharp)$ nor $(\!|\, e_2 \,|\!)(R^\sharp)$ evaluates to an interval, we must intervalize either argument. Both choices are valid, but they greatly influence the precision of the result.

All-Cases Strategy. A first idea is to try both choices for each multiplication; we get a set of linearized expressions. As we have no notion of greatest lower bound on expressions, we must evaluate a transfer function for all expressions in parallel and take the intersection $\cap^\sharp$ of the resulting abstract elements in $\mathcal{D}^\sharp$. Unfortunately, the cost is exponential in the number of multiplications in the original expression, hence the need for deterministic strategies that always select one interval affine form.
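On Fig. 2 the all-cases strategy has only $2^2 = 4$ choices, so it is easy to enumerate them. The sketch below does so for the interval domain. Because only $T$ is assigned, the intersection $\cap^\sharp$ of the resulting abstract elements reduces here to intersecting the ranges found for $T$. Helper names and the form encoding are hypothetical. The comments record what each choice yields when computed by hand.

```python
from fractions import Fraction as Fr
from itertools import product

def imul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def comb(i, j, sign):
    return (i[0] + j[0], i[1] + j[1]) if sign > 0 else (i[0] - j[1], i[1] - j[0])

def f_comb(l1, l2, sign):
    """⊞ (sign > 0) or ⊟ (sign < 0) on forms (constant, {var: coefficient})."""
    zero = (0, 0)
    keys = l1[1].keys() | l2[1].keys()
    return (comb(l1[0], l2[0], sign),
            {v: comb(l1[1].get(v, zero), l2[1].get(v, zero), sign) for v in keys})

def iota(l, bounds):
    lo, hi = l[0]
    for v, k in l[1].items():
        p = imul(k, bounds[v])
        lo, hi = lo + p[0], hi + p[1]
    return (lo, hi)

def var_times_var(a, b, intervalized, bounds):
    """Linearize a * b by replacing the factor `intervalized` with its range."""
    kept = b if intervalized == a else a
    return ((0, 0), {kept: bounds[intervalized]})

B = {"X": (Fr(0), Fr(1)), "Y": (Fr(0), Fr(1, 10)), "Z": (Fr(0), Fr(1, 5))}
Z = ((0, 0), {"Z": (1, 1)})

# T <- (X * Y) - (X * Z) + Z: choose which factor to intervalize in each product.
results = {}
for c1, c2 in product("XY", "XZ"):
    xy = var_times_var("X", "Y", c1, B)
    xz = var_times_var("X", "Z", c2, B)
    t = f_comb(f_comb(xy, xz, -1), Z, +1)
    results[(c1, c2)] = iota(t, B)
# ("X", "X") and ("Y", "X") give [0, 0.3]; the other two give [-0.2, 0.3].

meet = (max(r[0] for r in results.values()), min(r[1] for r in results.values()))
print(meet)  # the range [0, 3/10]
```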

Interval-Size Strategy. A simple strategy is to intervalize the affine form that will yield the narrower interval. This greedy approach tries to limit the amplitude of the non-determinism introduced by multiplications. The extreme case holds when the amplitude of one interval is zero, meaning that the sub-expression is semantically a constant; intervalizing it will not result in any precision loss. Finally, note that the relative amplitude $(b - a)/|a + b|$ may be more significant than the absolute amplitude $b - a$ if we prefer to intervalize expressions that are constant up to some small relative rounding error.
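A sketch of the interval-size choice, with both the absolute and the relative amplitude, is given below. The function names are illustrative. Two conventions are assumptions of this sketch and are not specified in the text: a zero-width interval counts as amplitude 0, and a relative amplitude with $a + b = 0$ counts as infinite. The example shows that the two measures can disagree.

```python
def amplitude(i, relative=False):
    a, b = i
    width = b - a
    if width == 0:
        return 0            # a constant: intervalizing it loses nothing
    if not relative:
        return width
    center = abs(a + b)
    return width / center if center else float("inf")

def pick(i1, i2, relative=False):
    """Index (0 or 1) of the argument to intervalize: the narrower one."""
    return 0 if amplitude(i1, relative) <= amplitude(i2, relative) else 1

big_but_tight = (1000.0, 1001.0)   # wide in absolute terms, tight relatively
small_but_loose = (0.0, 0.5)
print(pick(big_but_tight, small_but_loose))                 # 1
print(pick(big_but_tight, small_but_loose, relative=True))  # 0
```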

Simplification-Driven Strategy. Another idea is to maximize the amount of simplification by not intervalizing, when possible, sub-expressions containing variables that appear in other sub-expressions. For instance, in $X - (Y \times X)$, $Y$ will be intervalized to yield $[1 - \max Y, 1 - \min Y] \times X$. Unlike the preceding greedy approach, this strategy is global and treats the expression as a whole.

Homogeneity Strategy. We now consider the linear interpolation of Fig. 2. In order to achieve the best precision, it is important to intervalize $X$ in both multiplications. This yields $T \leftarrow [0, 1] \times Y + [0, 1] \times Z$, and we are able to prove that $T \geq 0$; however, we find that $T \leq 0.3$, while in fact $T \leq 0.2$. The interval-size strategy would choose to intervalize $Y$ and $Z$, which have a smaller range than $X$; this yields the imprecise assignment $T \leftarrow [-0.2, 0.1] \times X + [0, 0.2]$. Likewise, the simplification-driven strategy may choose to keep $X$, which appears in two sub-expressions, and also intervalize both $Y$ and $Z$. To solve this problem, we propose to intervalize the smallest set of variables that makes the expression homogeneous, that is, such that the arguments of the $+$ and $-$ operators have the same degree. In order to make the $(1 - X)$ sub-expression homogeneous, $X$ is intervalized. This last strategy is quite robust: it keeps working if we change the assignment into the equivalent $T \leftarrow X \times Y - X \times Z + Z$, or if we consider bi-linear interpolations or interpolations with normalization coefficients.
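The homogeneity strategy can be prototyped by brute force over candidate sets of variables. This Python sketch is one possible reading, not the analyzer's algorithm. Variables in the chosen set count as constants of degree 0. A `+` or `-` whose arguments have different degrees fails the check. Division is assigned degree $d_1 - d_2$, which is a hypothetical convention. The search returns the smallest passing set. The intended use is then to intervalize, in each product, the factor built only from these variables. On both forms of the interpolation it selects $\{X\}$.

```python
from itertools import combinations

def variables(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "itv":
        return set()
    return variables(e[1]) | variables(e[2])

def degree(e, consts):
    """Polynomial degree of e when the variables in `consts` count as
    constants; None if some + or - mixes arguments of different degrees."""
    tag = e[0]
    if tag == "var":
        return 0 if e[1] in consts else 1
    if tag == "itv":
        return 0
    d1, d2 = degree(e[1], consts), degree(e[2], consts)
    if d1 is None or d2 is None:
        return None
    if tag in ("+", "-"):
        return d1 if d1 == d2 else None
    if tag == "*":
        return d1 + d2
    if tag == "/":
        return d1 - d2   # hypothetical convention for this sketch
    raise ValueError(tag)

def homogeneity_set(e):
    """Smallest set of variables whose intervalization makes e homogeneous."""
    vs = sorted(variables(e))
    for size in range(len(vs) + 1):
        for chosen in combinations(vs, size):
            if degree(e, set(chosen)) is not None:
                return set(chosen)
    return set(vs)  # not reached: with all variables constant, every degree is 0

X, Y, Z = ("var", "X"), ("var", "Y"), ("var", "Z")
fig2 = ("+", ("-", ("*", X, Y), ("*", X, Z)), Z)
interp = ("+", ("*", X, Y), ("*", ("-", ("itv", 1, 1), X), Z))
print(homogeneity_set(fig2))    # {'X'}
print(homogeneity_set(interp))  # {'X'}
```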

4.4 Concluding Remark 

Our linearization is not equivalent to a static program transformation. To cope with non-linearity as best we can, we exploit the information dynamically inferred by the analysis: first, in the intervalization $\iota$, and then, in the multiplication strategy. Both algorithms take as argument the current numerical abstract environment $R^\sharp$. Dually, the linearization improves the precision of the next computed abstract element, so the dynamic nature of our approach ensures a positive feedback.

5 Symbolic Constant Propagation 

The automatic symbolic simplification implied by our linearization allows us to gain much precision when dealing with complex expressions, without the burden of designing new abstract domains tailored for them. However, the analysis is still sensitive to program transformations that decompose expressions and introduce new temporary variables, such as common sub-expression elimination or register spilling. In order to be immune to this problem, one must generally use an expressive, and so costly, relational domain. We propose an alternative, lightweight solution based on a kind of constant domain that tracks assignments dynamically and propagates symbolic expressions within transfer functions.



5.1 The Symbolic Constant Domain 

Enriched Expressions. We denote by $\mathcal{C}$ the set of all syntactic expressions, enriched with one element $\top^c$ denoting 'any value.' The flat ordering $\sqsubseteq^c$ is defined as $X \sqsubseteq^c Y \iff Y = \top^c \text{ or } X = Y$. The concrete semantics $\llbracket \cdot \rrbracket$ of Fig. 4 is extended to $\mathcal{C}$ as $\llbracket \top^c \rrbracket(\rho) = \mathbb{I}$. We also use two functions on expression trees: $occ : \mathcal{C} \to \mathcal{P}(\mathcal{V})$, which returns the set of variables occurring in an expression, and $subst : \mathcal{C} \times \mathcal{V} \times \mathcal{C} \to \mathcal{C}$, which substitutes, in its first argument, every occurrence of a given variable by its last argument. Their definition on non-$\top^c$ elements is quite standard and we do not present it here. They are extended to $\mathcal{C}$ as follows: $occ(\top^c) = \emptyset$, and $subst(e, V, \top^c)$ equals $e$ when $V \notin occ(e)$ and $\top^c$ when $V \in occ(e)$.
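The functions $occ$ and $subst$, together with their extension to $\top^c$, can be sketched over the same hypothetical tuple encoding of expressions. The text does not define $subst(\top^c, V, f)$ for $f \neq \top^c$. This sketch assumes it returns $\top^c$ and marks that case in a comment.

```python
TOP = ("top",)  # the element "any value" of C

def occ(e):
    """Set of variables occurring in e; occ(TOP) is empty."""
    if e == TOP or e[0] == "itv":
        return set()
    if e[0] == "var":
        return {e[1]}
    return occ(e[1]) | occ(e[2])

def subst(e, v, f):
    """Replace every occurrence of variable v in e with f."""
    if f == TOP:                     # extension to TOP, as defined in the text
        return TOP if v in occ(e) else e
    if e == TOP or e[0] == "itv":    # assumption: subst(TOP, v, f) = TOP
        return e
    if e[0] == "var":
        return f if e[1] == v else e
    return (e[0], subst(e[1], v, f), subst(e[2], v, f))

X, Y = ("var", "X"), ("var", "Y")
print(subst(("+", X, Y), "Y", X))    # ('+', ('var', 'X'), ('var', 'X'))
print(subst(("+", X, Y), "Y", TOP))  # ('top',)
```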

Abstract Symbolic Environments. The symbolic constant domain is the set $\mathcal{D}^c \stackrel{\mathrm{def}}{=} \mathcal{V} \to \mathcal{C}$ restricted as follows: there must be no cyclic dependencies in a map $S^c \in \mathcal{D}^c$, that is, there must be no pair-wise distinct variables $V_1, \ldots, V_n$ such that $\forall i < n,\ V_i \in occ(S^c(V_{i+1}))$ and $V_n \in occ(S^c(V_1))$. The partial order $\sqsubseteq^c$ on $\mathcal{D}^c$ is the point-wise extension of that on $\mathcal{C}$. Each element $S^c \in \mathcal{D}^c$ represents the set of environments compatible with the symbolic information:

Definition 6. $\gamma^c(S^c) \stackrel{\mathrm{def}}{=} \{ \rho \in (\mathcal{V} \to \mathbb{I}) \mid \forall k,\ \rho(V_k) \in \llbracket S^c(V_k) \rrbracket(\rho) \}$.
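The non-cyclicity restriction on $\mathcal{D}^c$ amounts to requiring that the graph with an edge from $V$ to each variable of $occ(S^c(V))$ has no cycle, self-loops included. A standard depth-first search detects such cycles. The sketch below assumes the same hypothetical expression encoding and represents a symbolic map as a dictionary.

```python
TOP = ("top",)

def occ(e):
    if e == TOP or e[0] == "itv":
        return set()
    if e[0] == "var":
        return {e[1]}
    return occ(e[1]) | occ(e[2])

def acyclic(S):
    """True iff there are no V1..Vn with Vi in occ(S(V(i+1))) and Vn in occ(S(V1))."""
    state = {v: "new" for v in S}

    def visit(v):
        state[v] = "open"                # v is on the current DFS path
        for w in occ(S[v]):
            if w not in S:
                continue
            if state[w] == "open":       # back edge: a dependency cycle
                return False
            if state[w] == "new" and not visit(w):
                return False
        state[v] = "done"
        return True

    return all(state[v] != "new" or visit(v) for v in S)

print(acyclic({"X": ("itv", 0, 1), "Y": ("var", "X")}))  # True
print(acyclic({"X": ("var", "Y"), "Y": ("var", "X")}))   # False
```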

Main Theorem. Our approach relies on the fact that applying a substitution from $S^c$ to any expression is sound with respect to $\gamma^c(S^c)$:

Theorem 5. $\forall e, V, S^c,\ \gamma^c(S^c) \models e \preceq subst(e, V, S^c(V))$.

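Abstract Operators. We now define the following operators on $\mathcal{D}^c$:

Definition 7.

• $\{\!|\, V \leftarrow e \,|\!\}^c(S^c)(V_k) \stackrel{\mathrm{def}}{=} \begin{cases} subst(e, V, S^c(V)) & \text{if } V = V_k \\ subst(S^c(V_k), V, S^c(V)) & \text{if } V \neq V_k \end{cases}$

• $\{\!|\, e \bowtie 0\,? \,|\!\}^c(S^c) \stackrel{\mathrm{def}}{=} S^c$

• $(S^c \cup^c T^c)(V_k) \stackrel{\mathrm{def}}{=} \begin{cases} S^c(V_k) & \text{if } S^c(V_k) = T^c(V_k) \\ \top^c & \text{otherwise} \end{cases}$

• $S^c \cap^c T^c \stackrel{\mathrm{def}}{=} S^c$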

Our assignment $V \leftarrow e$ first substitutes $V$ with $S^c(V)$ in $S^c$ and $e$ before adding the information that $V$ maps to the substituted $e$. This is necessary to remove all prior information on $V$ (no longer valid after the assignment) and to prevent the appearance of dependency cycles. As we are only interested in propagating assignments, tests are abstracted as the identity, which is sound but coarse. Our union abstraction only keeps syntactically equal expressions. This corresponds to the least upper bound with respect to $\sqsubseteq^c$. Our intersection keeps only the information of the left argument. All these operators respect the non-cyclicity condition. Note that one could be tempted to refine the intersection by mixing information from the left and right arguments in order to minimize the number of variables mapping to $\top^c$. Unfortunately, careless mixing may break the non-cyclicity condition. We settled, as a simpler but safe solution, on keeping the left argument. Finally, we do not need any widening: at each abstract iteration, unstable symbolic expressions are directly replaced with $\top^c$ when applying $\cup^c$, and so become stable.

5.2 Integration With a Numerical Abstract Domain 

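Given a numerical abstract domain $\mathcal{D}^\sharp$, the domain $\mathcal{D}^{\sharp \times c}$ is obtained by combining $\mathcal{D}^\sharp_{\mathcal{L}}$ with $\mathcal{D}^c$ as follows:

Definition 8.

• $\sqsubseteq^{\sharp \times c}$, $\cup^{\sharp \times c}$ and $\cap^{\sharp \times c}$ are defined pair-wise, and $\triangledown^{\sharp \times c} \stackrel{\mathrm{def}}{=} \triangledown^\sharp \times \cup^c$,

• $\gamma^{\sharp \times c}(R^\sharp, S^c) \stackrel{\mathrm{def}}{=} \gamma(R^\sharp) \cap \gamma^c(S^c)$,

• $\{\!|\, V \leftarrow e \,|\!\}^{\sharp \times c}(R^\sharp, S^c) \stackrel{\mathrm{def}}{=} (\{\!|\, V \leftarrow strat(e, S^c) \,|\!\}^\sharp_{\mathcal{L}}(R^\sharp),\ \{\!|\, V \leftarrow e \,|\!\}^c(S^c))$,

• $\{\!|\, e \bowtie 0\,? \,|\!\}^{\sharp \times c}(R^\sharp, S^c) \stackrel{\mathrm{def}}{=} (\{\!|\, strat(e, S^c) \bowtie 0\,? \,|\!\}^\sharp_{\mathcal{L}}(R^\sharp),\ \{\!|\, e \bowtie 0\,? \,|\!\}^c(S^c))$,

where $strat(e, S^c)$ is a substitution strategy that may perform sequences of substitutions of the form $f \mapsto subst(f, V, S^c(V))$ in $e$, for any variables $V$.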

All information in $\mathcal{D}^c$ and $\mathcal{D}^\sharp$ is computed independently, except that the symbolic information is used in the transfer functions for $\mathcal{D}^\sharp_{\mathcal{L}}$. The next section discusses the choice of a strategy $strat$. Note that, although we chose in this presentation to abstract the semantics of Fig. 4, our construction can be used on any class of expressions, including floating-point and non-numerical expressions.

5.3 Substitution Strategies 

Any sequence of substitutions extracted from the current symbolic constant information is sound, but some give better results than others. As for the intervalization of Sect. 4.3, we rely on carefully designed strategies.

Full Propagation. Thanks to the non-cyclicity of elements $S^C \in \mathcal{D}^C$, we can safely perform all substitutions $f \mapsto \mathit{subst}(f, V, S^C(V))$ for all $V$, in any order, and reach a normal form. This gives a first, basic substitution strategy. However, because our goal is to perform linearization-driven simplifications, it is important to avoid substituting with variable-free expressions, or we may lose correlations between multiple occurrences of variables. For instance, full substitution in the assignment $Z \leftarrow X - 0.5 \times Y$ with the environment $S^C = [X \mapsto [0, 1],\ Y \mapsto X]$ results in $Z \leftarrow [0, 1] - 0.5 \times [0, 1]$, and so $Z \in [-0.5, 1]$. Avoiding variable-free substitutions gives $Z \leftarrow X - 0.5 \times X$, which linearizes to $0.5 \times X$, and so $Z \in [0, 0.5]$, which is more precise. This refined strategy also succeeds in proving that $Y \in [0, 20]$ in the example of Fig. 1 by substituting $Y$ with $X$ in the test $Y < 0$.
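Both substitution strategies, and their effect on the example above, can be reproduced with a small Python sketch. It assumes a tuple encoding of expressions, linearizes into affine forms with interval coefficients, and handles a multiplication by intervalizing one factor, preferring a variable-free one. The names `propagate`, `linearize` and `eval_form` are hypothetical; this is an illustration, not the Astree implementation.

```python
from typing import Callable, Dict, Optional, Tuple

Interval = Tuple[float, float]
Expr = tuple  # ("var", name) | ("cst", lo, hi) | (op, e1, e2), op in add/sub/mul
Form = Tuple[Interval, Dict[str, Interval]]  # constant + sum of coeff * var


def iadd(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def isub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def imul(a: Interval, b: Interval) -> Interval:
    p = [x * y for x in a for y in b]
    return (min(p), max(p))


def has_var(e: Expr) -> bool:
    if e[0] == "var":
        return True
    return e[0] != "cst" and (has_var(e[1]) or has_var(e[2]))


def propagate(e: Expr, sym: Dict[str, Optional[Expr]],
              allow: Callable[[Expr], bool]) -> Expr:
    """Substitute V by S^C(V) wherever allowed; terminates since S^C is acyclic."""
    if e[0] == "var":
        s = sym.get(e[1])
        return propagate(s, sym, allow) if s is not None and allow(s) else e
    if e[0] == "cst":
        return e
    return (e[0], propagate(e[1], sym, allow), propagate(e[2], sym, allow))


def eval_form(f: Form, itv: Dict[str, Interval]) -> Interval:
    """Interval evaluation of an affine form with interval coefficients."""
    const, coeffs = f
    for v, c in coeffs.items():
        const = iadd(const, imul(c, itv[v]))
    return const


def linearize(e: Expr, itv: Dict[str, Interval]) -> Form:
    """Abstract e into an affine form with interval coefficients."""
    if e[0] == "cst":
        return ((e[1], e[2]), {})
    if e[0] == "var":
        return ((0.0, 0.0), {e[1]: (1.0, 1.0)})
    f1, f2 = linearize(e[1], itv), linearize(e[2], itv)
    if e[0] in ("add", "sub"):
        op = iadd if e[0] == "add" else isub
        coeffs = {v: op(f1[1].get(v, (0.0, 0.0)), f2[1].get(v, (0.0, 0.0)))
                  for v in f1[1].keys() | f2[1].keys()}
        return (op(f1[0], f2[0]), coeffs)
    # "mul": intervalize one factor, preferring a variable-free one (exact).
    if f1[1] and not f2[1]:
        f1, f2 = f2, f1
    k = eval_form(f1, itv)
    return (imul(k, f2[0]), {v: imul(k, c) for v, c in f2[1].items()})


itv = {"X": (0.0, 1.0), "Y": (0.0, 1.0)}
sym = {"X": ("cst", 0.0, 1.0), "Y": ("var", "X")}
z_expr = ("sub", ("var", "X"), ("mul", ("cst", 0.5, 0.5), ("var", "Y")))

full = propagate(z_expr, sym, lambda s: True)   # substitute everything
careful = propagate(z_expr, sym, has_var)       # skip variable-free values
print(eval_form(linearize(full, itv), itv))     # (-0.5, 1.0)
print(eval_form(linearize(careful, itv), itv))  # (0.0, 0.5)
```

The careful strategy leaves `X` in place, so the linearization can cancel the two occurrences of `X` into the single term `0.5 * X`.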

Enforcing Determinism and Linearity. Non-determinism in expressions is a major source of precision loss. Thus, one strategy is to avoid substituting $V$ with $S^C(V)$ whenever $\#(\{\!|\, S^C(V) \,|\!\}\rho) > 1$ for some $\rho \in \gamma^\sharp(R^\sharp)$, that is, whenever $S^C(V)$ may evaluate to several values. As this property is not easily computed, we propose the following sufficient syntactic criterion: $S^C(V)$ should neither be $\top^C$ nor contain a non-singleton interval. This strategy gives the expected result in the example of Fig. 1. Likewise, one may wish to avoid substituting with non-linear expressions, as they must subsequently be intervalized, which is a cause of precision loss. However, disabling too many substitutions may prevent the linearization step from exploiting correlations. Suppose that we break the last assignment of Fig. 2 into three parts: $U \leftarrow X \times Y$; $V \leftarrow (1 - X) \times Z$; $T \leftarrow U - V$. Then, the interval domain with linearization and symbolic constant propagation will not be able to prove that $T \in [0, 0.3]$ unless we allow $U$ and $V$ to be substituted with their non-linear symbolic values in the assignment to $T$.
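The syntactic criteria above are cheap to check on expression trees. The following Python sketch, under the same assumed tuple encoding (with `None` standing for $\top^C$), implements the sufficient determinism test together with an optional linearity test; the combined predicate can serve as the substitution filter of a propagation strategy. The helper names are illustrative only.

```python
from typing import Optional

Expr = tuple  # ("var", name) | ("cst", lo, hi) | (op, e1, e2), op in add/sub/mul


def has_var(e: Expr) -> bool:
    if e[0] == "var":
        return True
    return e[0] != "cst" and (has_var(e[1]) or has_var(e[2]))


def deterministic(e: Optional[Expr]) -> bool:
    """Sufficient test: not TOP (None) and no non-singleton interval constant."""
    if e is None:
        return False
    if e[0] == "var":
        return True
    if e[0] == "cst":
        return e[1] == e[2]
    return deterministic(e[1]) and deterministic(e[2])


def linear(e: Expr) -> bool:
    """Linear: every product has at least one variable-free factor."""
    if e[0] in ("var", "cst"):
        return True
    if e[0] == "mul" and has_var(e[1]) and has_var(e[2]):
        return False
    return linear(e[1]) and linear(e[2])


def allow(e: Optional[Expr], require_linear: bool = False) -> bool:
    """Substitution filter combining the two criteria."""
    return deterministic(e) and (not require_linear or linear(e))
```

With `require_linear=True`, a symbolic value such as `X * Y` would be rejected, which is exactly the kind of substitution the decomposed version of Fig. 2 needs; this illustrates the trade-off discussed above.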

Gaining More Precision. More precision can be achieved by slightly altering the definition of $\mathcal{D}^{\sharp\times C}$. A simple but effective idea is to allow several strategies, compute several transfer functions in $\mathcal{D}^\sharp$ in parallel, and take the abstract intersection $\cap^\sharp$ of the results. Another idea is to perform reductions from $\mathcal{D}^C$ to $\mathcal{D}^\sharp$ after each transfer function: $X^\sharp$ is replaced with $\{\!|\, V_k - S^C(V_k) = 0\ ? \,|\!\}^\sharp(X^\sharp)$ for some $k$. Reductions can be iterated to increase the precision, following Granger's local iterations scheme [10].
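In the interval domain, the reduction by the test $V_k - S^C(V_k) = 0$ can be approximated by intersecting the interval of $V_k$ with an interval evaluation of $S^C(V_k)$. The Python sketch below does only this (it does not refine the variables occurring in $S^C(V_k)$, which a full test transfer function could do) and repeats the pass until stabilization or a hypothetical round limit, in the spirit of local iterations. The encoding and all names are assumptions.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]
Expr = tuple  # ("var", name) | ("cst", lo, hi) | (op, e1, e2), op in add/sub/mul
Env = Optional[Dict[str, Interval]]  # None encodes the empty (bottom) element


def ieval(e: Expr, env: Dict[str, Interval]) -> Interval:
    if e[0] == "var":
        return env[e[1]]
    if e[0] == "cst":
        return (e[1], e[2])
    a, b = ieval(e[1], env), ieval(e[2], env)
    if e[0] == "add":
        return (a[0] + b[0], a[1] + b[1])
    if e[0] == "sub":
        return (a[0] - b[1], a[1] - b[0])
    p = [x * y for x in a for y in b]
    return (min(p), max(p))


def reduce_once(itv: Dict[str, Interval], sym: Dict[str, Optional[Expr]]) -> Env:
    """One pass of V_k := V_k meet eval(S^C(V_k)) over all non-TOP entries."""
    out = dict(itv)
    for v, e in sym.items():
        if e is None:          # TOP: nothing to learn about v
            continue
        lo, hi = ieval(e, out)
        a, b = out[v]
        if max(a, lo) > min(b, hi):
            return None        # contradiction: the state is unreachable
        out[v] = (max(a, lo), min(b, hi))
    return out


def reduce_iter(itv: Dict[str, Interval], sym, max_rounds: int = 10) -> Env:
    """Repeat the reduction until nothing changes (or the round limit is hit)."""
    for _ in range(max_rounds):
        nxt = reduce_once(itv, sym)
        if nxt is None or nxt == itv:
            return nxt
        itv = nxt
    return itv


itv0 = {"X": (0, 10), "Y": (0, 3), "Z": (0, 20)}
sym0 = {"X": ("var", "Y"), "Z": ("add", ("var", "X"), ("cst", 1, 1))}
print(reduce_iter(itv0, sym0))  # {'X': (0, 3), 'Y': (0, 3), 'Z': (1, 4)}
```

Each step is sound because the symbolic invariant $V_k = S^C(V_k)$ holds in every concrete state, so $V_k$ must lie in both intervals.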

6 Application to the Astree Analyzer 

Astree is an efficient static analyzer focusing on the detection of run-time errors in programs written in a subset of the C programming language, excluding recursion, dynamic memory allocation and concurrent executions. It aims at a degree of precision sufficient to actually prove the absence of run-time errors. This is achieved by specializing the analyzer towards specific program families, introducing various abstract domains, and setting iteration strategy parameters. Currently, the considered family of programs is that of safety-critical, embedded, fly-by-wire avionic software, featuring large reactive loops running for billions of iterations, thousands of global state variables, and pervasive floating-point arithmetic. We refer the reader to [1] for more detailed information on Astree.

Integrating the Symbolic Methods. Astree uses a partially reduced product of several numerical abstract domains, together with both of our symbolic enhancement methods. Relational domains, such as the octagon [12] or digital filtering [9] domains, rely on the linearization to abstract complex floating-point expressions into interval affine forms on reals. The interval domain is refined by combining three versions of each transfer function: firstly, using the expression unchanged; secondly, using the linearized expression; thirdly, applying symbolic constant propagation followed by linearization. We use the simplification-driven multiplication strategy, as well as the full propagation strategy, without propagating variable-free expressions.
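Refining the interval domain with three versions of each transfer function amounts to intersecting three interval results. The Python sketch below illustrates this on sums and differences only (multiplication is omitted for brevity); the expression encoding, the function names and the example are hypothetical. On $Z \leftarrow Y - X$ with $S^C = [Y \mapsto X]$ and $X, Y \in [0, 1]$, the first two versions both give $[-1, 1]$, while propagation followed by linearization yields $X - X$, hence $[0, 0]$.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]
Expr = tuple  # ("var", name) | ("cst", lo, hi) | ("add" | "sub", e1, e2)


def ieval(e: Expr, itv: Dict[str, Interval]) -> Interval:
    """Version 1: plain interval evaluation of the unchanged expression."""
    if e[0] == "var":
        return itv[e[1]]
    if e[0] == "cst":
        return (e[1], e[2])
    a, b = ieval(e[1], itv), ieval(e[2], itv)
    if e[0] == "add":
        return (a[0] + b[0], a[1] + b[1])
    return (a[0] - b[1], a[1] - b[0])


def linearize(e: Expr):
    """Affine form: (interval constant, {variable: scalar coefficient})."""
    if e[0] == "var":
        return (0.0, 0.0), {e[1]: 1.0}
    if e[0] == "cst":
        return (e[1], e[2]), {}
    (c1, m1), (c2, m2) = linearize(e[1]), linearize(e[2])
    sign = 1.0 if e[0] == "add" else -1.0
    coeffs = {v: m1.get(v, 0.0) + sign * m2.get(v, 0.0)
              for v in m1.keys() | m2.keys()}
    if sign > 0:
        const = (c1[0] + c2[0], c1[1] + c2[1])
    else:
        const = (c1[0] - c2[1], c1[1] - c2[0])
    return const, coeffs


def eval_linear(e: Expr, itv: Dict[str, Interval]) -> Interval:
    """Version 2: linearize (cancelling X - X), then evaluate on intervals."""
    (lo, hi), coeffs = linearize(e)
    for v, k in coeffs.items():
        a, b = itv[v]
        lo, hi = lo + min(k * a, k * b), hi + max(k * a, k * b)
    return (lo, hi)


def has_var(e: Expr) -> bool:
    if e[0] == "var":
        return True
    return e[0] != "cst" and (has_var(e[1]) or has_var(e[2]))


def propagate(e: Expr, sym: Dict[str, Optional[Expr]]) -> Expr:
    """Substitute V by S^C(V), except with variable-free values."""
    if e[0] == "var":
        s = sym.get(e[1])
        return propagate(s, sym) if s is not None and has_var(s) else e
    if e[0] == "cst":
        return e
    return (e[0], propagate(e[1], sym), propagate(e[2], sym))


def combined(e: Expr, itv: Dict[str, Interval], sym) -> Interval:
    """Meet of the three versions (version 3: propagation, then linearization)."""
    results = [ieval(e, itv), eval_linear(e, itv),
               eval_linear(propagate(e, sym), itv)]
    return (max(r[0] for r in results), min(r[1] for r in results))


itv = {"X": (0.0, 1.0), "Y": (0.0, 1.0)}
sym = {"Y": ("var", "X")}
print(combined(("sub", ("var", "Y"), ("var", "X")), itv, sym))  # (0.0, 0.0)
```

Taking the meet guarantees that the combined result is never less precise than any single version, at the price of evaluating each transfer function three times.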

Experimental Results. We present analysis results on several programs. All the analyses have been carried out on a 64-bit AMD Opteron 248 (2 GHz) workstation running Linux, using a single processor. The following table compares the precision and efficiency of Astree before and after enabling our two symbolic methods:





             | without enhancements                 | with enhancements
code size    | analysis  nb. of  memory   alarms    | analysis  nb. of  memory   alarms
(lines)      | time      iters.                     | time      iters.
-------------+--------------------------------------+-------------------------------------
370          | 1.8s      17      16 MB    0         | 3.1s      17      16 MB    0
9 500        | 90s       39      80 MB    8         | 160s      39      81 MB    8
70 000       | 2h 40mn   141     559 MB   391       | 1h 16mn   44      582 MB   0
226 000      | 11h 16mn  150     1.3 GB   141       | 6h 36mn   86      1.3 GB   1
400 000      | 22h 9mn   172     2.2 GB   282       | 13h 52mn  96      2.2 GB   0






The precision gain is quite impressive, as up to hundreds of alarms are removed. In two cases, this increase in precision is sufficient to achieve zero alarms, which actually proves the absence of run-time errors. Moreover, the increase in memory consumption is negligible. Finally, in our largest examples, our enhancement methods save analysis time: although each abstract iteration is more costly (up to 25%), this is compensated by the reduced number of iterations needed to stabilize our invariants, as a smaller state space is explored.

Discussion. Symbolic constant propagation could also be used in relational domains, but this was not needed to remove alarms in our examples. Our experiments show that, even though the linearization and constant propagation techniques on intervals are not as robust as fully relational abstract domains, they are quite versatile thanks to their parametrization in terms of strategies, and much simpler to implement than even a simple relational abstract domain. Moreover, our methods exhibit a near-linear time and memory cost, which makes them much cheaper than relational domains.

7 Conclusion 

We have proposed, in this article, two techniques, called linearization and symbolic constant propagation, that can be combined together to improve the precision of numerical abstract domains. In particular, we are able to compensate for the lack of non-linear transfer functions in the polyhedron and octagon domains, and for a weak or non-existent level of relationality in the octagon and interval domains. Finally, they help make abstract domains robust against program transformations. Thanks to their parameterization in terms of strategies, they can be finely tuned to take into account semantic as well as syntactic program features. They are also very lightweight in terms of both analysis and development costs. We found that, in many cases, it is easier and faster to design a couple of linearization and symbolic propagation strategies to solve a local loss of precision in some program, while keeping the interval abstract domain, than to develop a specific relational abstract domain able to represent the required local properties. Strategies also proved reusable on programs belonging to the same family. Practical results obtained within the Astree static analyzer show that our methods both increase the precision and save analysis time. They were key in proving the absence of run-time errors in real-life critical embedded avionics software.

Future Work. Because the precision gain strongly depends on the multiplication strategy used in our linearization and on the propagation strategy used in the symbolic constant domain, a natural extension of our work is to design new such strategies, adapted to different practical cases. A more challenging task would be to provide theoretical guarantees that some strategies make abstract domains immune to given classes of program transformations.

Acknowledgments. We would like to thank all the former and present members of the Astree team: B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, D. Monniaux and X. Rival. We would also like to thank the anonymous referees for their useful comments.

References 

[1] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. A static analyzer for large safety-critical software. In ACM PLDI'03, volume 548030, pages 196-207. ACM Press, 2003.

[2] F. Bourdoncle. Efficient chaotic iteration strategies with widenings. In FMPA'93, volume 735 of LNCS, pages 128-141. Springer, 1993.

[3] R. Clarisó and J. Cortadella. The octahedron abstract domain. In SAS'04, volume 3148 of LNCS, pages 312-327. Springer, 2004.

[4] C. Colby. Semantics-Based Program Analysis via Symbolic Composition of Transfer Relations. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 1996.

[5] P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In ISOP'76, pages 106-130. Dunod, Paris, France, 1976.

[6] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM POPL'77, pages 238-252. ACM Press, 1977.

[7] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs. Journal of Logic Programming, 13(2-3):103-179, 1992.

[8] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In ACM POPL'78, pages 84-97. ACM Press, 1978.

[9] J. Feret. Static analysis of digital filters. In ESOP'04, volume 2986 of LNCS. Springer, 2004.

[10] P. Granger. Improving the results of static analyses programs by local decreasing iteration. In FSTTCS, volume 652 of LNCS, pages 68-79. Springer, 1992.

[11] G. Kildall. A unified approach to global program optimization. In ACM POPL'73, pages 194-206. ACM Press, 1973.

[12] A. Miné. The octagon abstract domain. In AST 2001 in WCRE 2001, IEEE, pages 310-319. IEEE CS Press, 2001.

[13] A. Miné. Relational abstract domains for the detection of floating-point run-time errors. In ESOP'04, volume 2986 of LNCS, pages 3-17. Springer, 2004.

[14] A. Miné. Weakly Relational Numerical Abstract Domains. PhD thesis, École Polytechnique, Palaiseau, France, December 2004.

[15] A. Simon, A. King, and J. Howe. Two variables per linear inequality as an abstract domain. In LOPSTR'02, volume 2664 of LNCS, pages 71-89. Springer, 2002.

[16] M. Vinícius, A. Andrade, J. L. D. Comba, and J. Stolfi. Affine arithmetic. In INTERVAL'94, 1994.



